A Robust-Equitable Copula Dependence Measure for Feature Selection

Authors

  • Yale Chang
  • Yi Li
  • A. Adam Ding
  • Jennifer G. Dy
Abstract

Feature selection aims to select relevant features to improve the performance of predictors. Many feature selection methods depend on the choice of dependence measure. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable; i.e., it should treat linear and nonlinear relationships equally. In this paper, we introduce the concept of robust-equitability and identify a robust-equitable dependence measure, the robust copula dependence (RCD). Compared to existing dependence measures, RCD has the following advantages: it is robust to different relationship forms and robust to unequal sample sizes of different features. In contrast, existing dependence measures cannot take these factors into account simultaneously. Experiments on synthetic and real-world datasets confirm our theoretical analysis and illustrate the advantage of RCD in feature selection.
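
The idea of ranking features by a copula-based dependence score can be sketched as follows. This is a minimal illustration, not the estimator used in the paper: it assumes the score is half the L1 distance between the copula density and the uniform (independence) copula density, which the abstract does not state, and it approximates that quantity with a plain histogram on rank-transformed data. All function names, the bin count, and the synthetic data are hypothetical.

```python
import random


def rank_transform(values):
    """Map each value to (0, 1) by its rank (empirical copula transform)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def copula_l1_score(x, y, bins=8):
    """Histogram estimate of 0.5 * integral |c(u, v) - 1| du dv.

    Assumption: this stands in for a robust copula dependence score;
    the histogram estimator is an illustrative choice only.
    """
    n = len(x)
    u, v = rank_transform(x), rank_transform(y)
    counts = [[0] * bins for _ in range(bins)]
    for ui, vi in zip(u, v):
        counts[int(ui * bins)][int(vi * bins)] += 1
    cell_area = 1.0 / (bins * bins)
    # Within a cell the estimated density is (count / n) / cell_area, so the
    # integral of |density - 1| over that cell equals |count / n - cell_area|.
    return 0.5 * sum(abs(c / n - cell_area) for row in counts for c in row)


def rank_features(features, y, bins=8):
    """Return (name, score) pairs sorted from most to least dependent on y."""
    scored = [(name, copula_l1_score(col, y, bins)) for name, col in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def make_demo_data(n=2000, seed=0):
    """Synthetic data: one linear, one quadratic, and one irrelevant feature."""
    rng = random.Random(seed)
    linear = [rng.uniform(-1, 1) for _ in range(n)]
    quadratic = [rng.uniform(-1, 1) for _ in range(n)]
    noise = [rng.uniform(-1, 1) for _ in range(n)]
    y = [a + 2 * b * b + rng.gauss(0, 0.1) for a, b in zip(linear, quadratic)]
    return {"linear": linear, "quadratic": quadratic, "noise": noise}, y


if __name__ == "__main__":
    features, y = make_demo_data()
    for name, score in rank_features(features, y):
        print(f"{name}: {score:.3f}")
```

Because the score is computed on ranks, it does not change under strictly increasing transformations of either variable, and a non-monotone relationship such as the quadratic one is not penalized the way it would be under a linear correlation coefficient. The histogram estimate is biased upward for independent data, so scores near zero should be read relative to a noise baseline.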

Similar resources

A Robust-Equitable Measure for Feature Ranking and Selection

In many applications, not all the features used to represent data samples are important. Often only a few features are relevant for the prediction task. The choice of dependence measure often affects the final result of many feature selection methods. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable, a concept pr...

Supplementary Material: A Robust-Equitable Copula Dependence Measure for Feature Selection

For simplicity, we focus on the bivariate case (X and Y are each one-dimensional variables). The extension of the proof to the multivariate case is straightforward. We first work on the mutual information, then show that similar arguments apply to the copula distances. To prove the theorem, we use Le Cam [1973]'s method to find a lower bound on the minimax risk of estimating the mutual information MI. To ...

Copula-based Kernel Dependency Measures

The paper presents a new copula-based method for measuring dependence between random variables. Our approach extends the Maximum Mean Discrepancy to the copula of the joint distribution. We prove that this approach has several advantageous properties. Similarly to Shannon mutual information, the proposed dependence measure is invariant to any strictly increasing transformation of the marginal v...

A Copula Statistic for Measuring Nonlinear Dependence with Application to Feature Selection in Machine Learning

Feature selection in machine learning aims to find the best subset of input variables, one that reduces the computational requirements and improves predictor performance. This paper introduces a new index based on empirical copulas, termed the Copula Statistic (CoS), to assess the strength of statistical dependence and to test for statistical independence. It shows that this test exh...

GJR-Copula-CVaR Model for Portfolio Optimization: Evidence for Emerging Stock Markets

This paper empirically examines the impact of the dependence structure between assets on the optimization of a portfolio composed of the Tehran Stock Exchange Price Index and the Borsa Istanbul 100 Index. In this regard, copula family functions are proposed as a powerful and flexible tool to determine the dependence structure. Finally, the impact of the dep...

Publication date: 2016